ABSTRACT. In a Bayesian approach to solving linear inverse problems, one needs to specify the prior laws in order to compute the posterior law. A cost function can also be defined to provide a common tool for various Bayesian estimators, which depend on the data and on the hyperparameters. Except in the Gaussian case, these estimators are not linear and therefore depend on the scale of the measurements. In this paper a property weaker than linearity is imposed on the Bayesian estimator, namely the scale invariance property (SIP).

First, we state some results on linear estimation, and then we introduce and justify a scale invariance axiom. We show that an arbitrary choice of the measurement scale can be avoided if the estimator has this SIP. Some classical regularization procedures are shown to be scale invariant. Then we investigate general conditions under which classes of Bayesian estimators satisfy this SIP, as well as the consequences for the cost function and the prior laws. We also show that classical hyperparameter estimation methods (i.e., Maximum Likelihood and Generalized Maximum Likelihood) can be used, and we verify that they satisfy the SIP.

Finally, we discuss how to choose the prior laws to obtain scale invariant Bayesian estimators. For this, we consider two cases of prior laws: entropic prior laws and first-order Markov models. In related preceding works […], the SIP constraints have been studied for the case of entropic prior laws. This paper provides the extension to first-order Markov models.

KEY WORDS: Bayesian estimation, Scale invariance, Markov modelling, Inverse problems, Image reconstruction, Prior model selection

1. Introduction 

Linear inverse problems provide a common framework for many different objectives, such as the reconstruction, restoration, or deconvolution of images arising in various applied areas. The problem is to estimate an object $x$ that is observed indirectly through a linear operator $A$, the observation being corrupted by noise. We explicitly choose this linear model because, despite its simplicity, it captures many of the interesting features of more complex models without their computational complexity. This degradation model can be written as:

$$y = Ax + b, \qquad (1)$$

where $b$ includes both the modeling errors and the unavoidable noise of any physical observation system, and $A$ represents the indirect observation system and depends on the particular application. For example, $A$ can be diagonal or block-diagonal in deblurring, Toeplitz or block-Toeplitz in deconvolution, or have no particular structure, as in X-ray tomography.

In order to solve these problems, one may choose to minimize the quadratic residual error $\|y - Ax\|^2$. This leads to the classical linear system

$$A^t A x = A^t y. \qquad (2)$$

When mathematically exact solutions exist, they are too sensitive to the unavoidable noise to be of practical interest. This is due to the very high condition number of $A$. In order to obtain a useful solution, we must mathematically characterize the admissible solutions.
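As a minimal numerical illustration of this sensitivity (a sketch with a hypothetical, nearly singular 2×2 matrix of our own choosing, not taken from any application): when $A$ is square and invertible, (2) reduces to $Ax = y$, and a perturbation of order $10^{-3}$ in $y$ moves the exact solution by an amount of order 20.

```python
# Sketch: sensitivity of the exact solution of A x = y to small noise when A is
# ill-conditioned. The matrix, true object and noise below are hypothetical.

def solve2(A, y):
    """Exact solution of the 2x2 system A x = y by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(d * y[0] - b * y[1]) / det, (a * y[1] - c * y[0]) / det]

eps = 1e-4
A = [[1.0, 1.0], [1.0, 1.0 + eps]]  # nearly singular: condition number about 4 / eps
x_true = [1.0, 1.0]
y = [sum(a * t for a, t in zip(row, x_true)) for row in A]
noise = [1e-3, -1e-3]
y_noisy = [yi + ni for yi, ni in zip(y, noise)]

x_clean = solve2(A, y)        # recovers x_true up to rounding
x_noisy = solve2(A, y_noisy)  # each component moves by about 20
print(x_clean, x_noisy)
```

The regularized estimators discussed below trade this exactness for stability.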
The Bayesian framework is well suited to this kind of problem because it combines information from the data $y$ with prior knowledge about the solution. One then needs to specify the prior laws $p_x(x;\lambda)$ and $p_b(y - Ax;\psi)$ to compute the posterior law $p_{x|y}(x|y) \propto p_x(x)\,p_b(y - Ax)$ with Bayes' rule. Most classical Bayesian estimators, e.g., Maximum a Posteriori (MAP), Posterior Mean (PM) and Marginal MAP (MMAP), can be studied with a common tool, namely a cost function $C(x^*, x)$ defined for each of them. This leads to the classical Bayesian estimator

$$\hat{x}(y,\theta) = \arg\min_{x}\left\{E_{x^*|y}\{C(x^*, x) \mid y\}\right\}, \qquad (3)$$

which depends both on the data $y$ and on the hyperparameters $\theta$.

Choosing a prior model is a difficult task, since the model should include our prior knowledge. Some criteria based on information theory and on the maximum entropy principle have been used for that purpose. For example, when our prior knowledge consists of the moments of the image to be restored, application of the maximum entropy principle leads Djafari & Demoment […] to an exact determination of the prior, including its parameters. Knowledge of the bounds (a gabarit) and the choice of a reference measure lead Le Besnerais […] to the construction of a model accounting for a human shaped prior in the context of astronomical deconvolution.

We consider the case where no significant quantitative prior information, such as knowledge of the moments or bounds of the solution, is available. We then propose to reduce the arbitrariness of the choice of the prior model by applying a constraint to the resulting Bayesian estimator. The main constraint is that the estimator be scale invariant, that is, whatever scale or physical unit we choose, the estimation results must be physically identical. This desirable property reduces the possible choices of prior models and makes the result independent of the unavoidable choice of scale. In this sense, related works of Jaynes or Box & Tiao […] on non-informative priors are close to our approach, although in these works the ignorance is not limited to the measurement scale. In our work, only qualitative information (apart from positivity) is assumed to be known, so we regard choosing a parametric family of probability laws as a usual and natural way of accounting for the prior. The parameters of the chosen family of laws are estimated from the data with a Maximum Likelihood (ML) or Generalized Maximum Likelihood (GML) approach. These approaches are shown in this paper to be scale invariant.

One may criticize choosing the prior law from a desired property of the final estimator rather than from the available prior knowledge. We do not claim to choose a model exactly, but only to restrict the available choices. Likewise, the popularity of Gaussian or convex priors is probably due to the tractability of the associated estimators rather than to the Gaussianity or convexity of the modeled process. Lastly, however good the model is, its use depends on the tradeoff between the good behavior of the final estimator and the quality of estimation.

The paper is organized as follows. In Section 2, we recall some known results on Gaussian estimators, then introduce and justify the imposition of the scale invariance property (SIP) on the estimator, with various examples of scale invariant models. In Section 3 we prove a general theorem for a Bayesian estimator to be scale invariant. This theorem states a sufficient condition on the prior laws, which can be used to reduce the choice to admissible priors. Section 4 shows that ML and GML hyperparameter estimation preserve this property, and Section 5 applies these results to prior model selection. For this, we consider two cases of prior laws: entropic prior laws and first-order Markov models. In related preceding works […], the SIP constraints have been studied for the case of entropic prior laws. In this paper we extend that work to the case of first-order Markov models.

2. Linearity and scale invariance property

In order to better understand the scale invariance property (SIP), in the next subsection we consider in detail the classical case of linear estimators. First, let us define linearity as the combination of additivity:

$$\forall (y_1, y_2), \quad y_1 \mapsto \hat{x}_1 \text{ and } y_2 \mapsto \hat{x}_2 \implies y_1 + y_2 \mapsto \hat{x}_1 + \hat{x}_2, \qquad (4)$$

and the scale invariance property (SIP):

$$\forall y, \quad y \mapsto \hat{x} \implies \forall k, \; ky \mapsto k\hat{x}. \qquad (5)$$

Linearity includes the SIP and is therefore a stronger property. We now show, in a particular case, how the SIP is satisfied by these linear models.

2.1. Linearity and Gaussian assumptions 

Linear estimators under Gaussian assumptions have been (and probably still are) the most studied Bayesian estimators because they lead to an explicit estimation formula. Their practical interest likewise stems from their easy implementation, as in Kalman filtering. In all these cases, the prior laws have the following form:

$$p_x(x) \propto \exp\left(-\tfrac{1}{2}(x - m_x)^t\,\Sigma_x^{-1}(x - m_x)\right), \qquad (6)$$

whereas the additive noise is often a zero-mean Gaussian process $\mathcal{N}(0, \Sigma_b)$.

For all three classical cost functions (MAP, PM and MMAP), computing the estimate amounts to minimizing the same quadratic form. This leads to the general form of the solution

$$\hat{x} = (A^t\Sigma_b^{-1}A + \Sigma_x^{-1})^{-1}(A^t\Sigma_b^{-1}y + \Sigma_x^{-1}m_x), \qquad (7)$$

which is a linear estimator.
Some particular cases follow:

• Case where $\Sigma_x^{-1} = 0$ and $\Sigma_b = \sigma_b^2 I$. This can be interpreted as a degenerate uniform prior on the solution. The solution is the minimum variance one and is rarely suitable because of the high condition number of $A$.

• Case where $\Sigma_b = \sigma_b^2 I$ and $\Sigma_x = \sigma_x^2 I$. This leads to the classical Gaussian inversion formula:

$$\hat{x} = (A^t A + \mu I)^{-1}(A^t y + \mu\,m_x), \quad \text{with } \mu = \sigma_b^2/\sigma_x^2. \qquad (8)$$

The signal-to-noise ratio (SNR) appears explicitly, through its inverse $\mu = \sigma_b^2/\sigma_x^2$, and is a scale invariant parameter. It therefore plays the meaningful role of a hyperparameter.

• The Gauss-Markov regularization case, which assumes a smooth prior on the solution, is specified by setting $\Sigma_x^{-1} = \mu D^t D + \sigma^{-2} I$, with $D$ a discrete difference matrix.

In all these cases, the estimate $\hat{x}$ depends on a scale. Let us examine this dependence by changing the measurement scale. For example, suppose that $x$ and $y$ are optical images in which each pixel represents the illumination (in lumens) on the surface of an optical device, and that we instead measure the number of photons reaching this device. (This could be of practical interest in X-ray tomography.) We then convert $y$ to the new scale, $y \mapsto ky$, and simultaneously update the parameters $\Sigma_b$, $\Sigma_x$ and $m_x$ ($\Sigma_b \mapsto k^2\Sigma_b$, $\Sigma_x \mapsto k^2\Sigma_x$, $m_x \mapsto k\,m_x$). The estimation formula then becomes

$$\hat{x}_k = (A^t k^{-2}\Sigma_b^{-1}A + k^{-2}\Sigma_x^{-1})^{-1}(A^t k^{-2}\Sigma_b^{-1}\,ky + k^{-2}\Sigma_x^{-1}\,k\,m_x), \qquad (9)$$

or, canceling the scale factor $k$,

$$\hat{x}_k = k\,(A^t\Sigma_b^{-1}A + \Sigma_x^{-1})^{-1}(A^t\Sigma_b^{-1}y + \Sigma_x^{-1}m_x) = k\,\hat{x}. \qquad (10)$$

Thus, if the hyperparameters are updated consistently, the two restored images are physically the same.

This property is rarely stated in the Gaussian case, which can be explained by the use of the SNR as the main tool of reasoning: once the SNR is fixed, $\hat{x}_k$ and $k\hat{x}$ are equal.
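The cancellation in (9) and (10) can be checked numerically. The sketch below is our own illustration in pure Python with small hypothetical matrices; helper names such as `solve` and `gaussian_estimate` are ours. It implements (7) with precision matrices $\Sigma_b^{-1}$ and $\Sigma_x^{-1}$, rescales the data, the prior mean and the covariances as in (9) (precisions divided by $k^2$), and compares the result with $k\hat{x}$; it also shows that leaving the hyperparameters unchanged breaks the relation.

```python
# Sketch: numerical check of Eqs. (7), (9) and (10). All data are hypothetical.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def scaled(M, s):
    return [[s * m for m in row] for row in M]

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z

def gaussian_estimate(A, y, Pb, Px, mx):
    """Eq. (7) with precisions Pb = Sigma_b^{-1} and Px = Sigma_x^{-1}."""
    AtPb = matmul(transpose(A), Pb)
    lhs = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(matmul(AtPb, A), Px)]
    rhs = [u + v for u, v in zip(matvec(AtPb, y), matvec(Px, mx))]
    return solve(lhs, rhs)

A = [[1.0, 0.5, 0.0], [0.2, 1.0, 0.3], [0.0, 0.4, 1.0], [0.5, 0.0, 0.7]]
y = [1.0, 2.0, 0.5, 1.5]
Pb = scaled([[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)], 10.0)
Px = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]  # Gauss-Markov-like precision
mx = [0.5, 0.5, 0.5]
k = 3.7

x_hat = gaussian_estimate(A, y, Pb, Px, mx)
# Change of scale with consistently updated hyperparameters, as in Eq. (9).
x_hat_k = gaussian_estimate(A, [k * v for v in y], scaled(Pb, k ** -2),
                            scaled(Px, k ** -2), [k * v for v in mx])
# Same rescaled data, but hyperparameters left unchanged.
x_hat_k_stale = gaussian_estimate(A, [k * v for v in y], Pb, Px, mx)
print(x_hat, x_hat_k)
```

With the update, each component of `x_hat_k` equals `k` times the corresponding component of `x_hat` up to rounding; without it, the prior mean term no longer scales with the data.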

In many cases the Gaussian assumptions are fulfilled, often leading to fast algorithms for computing the resulting linear estimator. Here we focus on the case where Gaussian assumptions are too strong. This is the case when Gauss-Markov models are used, leading to smoother restorations than desired. This may be explained by the short tails of the Gaussian distribution, which make discontinuities rare and prevent wide homogeneous areas from appearing in the restored image.
2.2. Scale invariance basics

Although the particular case considered above may appear obvious, it underlies the scale invariance axiom. In order to estimate or compare physical parameters, we must choose a measurement scale. This may be a physically meaningful unit or merely a grey-level scale in computerized optics. In any case, we have to keep in mind that a physical unit or scale is just a practical but arbitrary tool, both common and convenient. As a consequence of this remark, we state the following axiom of scale invariance:



Estimation results must not depend on the arbitrary choice of the measurement scale.

This is particularly relevant when the measurement scale depends on the exposure time (astronomical observations, positron emission tomography, X-ray tomography, etc.): estimation results obtained with two different exposure times must be coherent. The SIP is also of practical interest when exhaustive tests are required for validation.

Let us look at some regularized criteria for Bayesian estimation. In all cases the MAP criterion is used, and the estimators take the following form:

$$\hat{x}(y;\psi,\lambda) = \arg\min_{x}\left\{-\log p_b(y - Ax;\psi) - \log p_x(x;\lambda)\right\}. \qquad (11)$$



Lp-norm estimators: The general form of these criteria involves an Lp-norm rather than a quadratic norm. The noise and prior models then take the following form:

$$p_b(y - Ax;\psi) \propto \exp\left[-\psi\,\|y - Ax\|_p^p\right] \qquad (12)$$

and

$$p_x(x;\lambda) \propto \exp\left[-\lambda\,\|Mx\|_q^q\right], \qquad (13)$$

where $M$ can be a difference matrix, as used by Bouman & Sauer and by Besag in the Generalized Gauss-Markov models and L1-Markov models […]. Finally, $q = 1$ with $M$ the identity matrix leads to an L1-deconvolution algorithm used in seismic deconvolution […].

Under the scale transformation $x \mapsto kx$ and $y \mapsto ky$, the models change as follows:

$$p_b(ky - Akx;\psi) \propto \exp\left[-k^p\psi\,\|y - Ax\|_p^p\right] \qquad (14)$$

and

$$p_x(kx;\lambda) \propto \exp\left[-k^q\lambda\,\|Mx\|_q^q\right]. \qquad (15)$$

If we set $(\psi_k, \lambda_k) = (k^{-p}\psi, k^{-q}\lambda)$, the estimator is scale invariant. Moreover, if $p = q$, the common factor $k^p$ can be dropped from the MAP criterion (e.g. […]), which then becomes scale invariant without any hyperparameter update. This is done in […], but it makes the choices of the prior and noise models mutually dependent. We can also remark that $\psi^q/\lambda^p$ is scale invariant and can be interpreted as a generalized SNR.
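Because $J_k(kx) = k^{-p}\psi\,k^p\|y - Ax\|_p^p + k^{-q}\lambda\,k^q\|Mx\|_q^q = J(x)$ for every $x$, the minimizers satisfy $\hat{x}_k = k\hat{x}$. The following sketch checks this identity at random points, together with the invariance of $\psi^q/\lambda^p$. The values of $A$, $M$, $y$, $p$, $q$ are illustrative choices of ours, and `map_criterion` is a hypothetical helper implementing the negative log-posterior (11) for the models (12) and (13), up to additive constants.

```python
# Sketch: scale behaviour of the Lp-norm MAP criterion built from Eqs. (11)-(13).
import random

def lp_pow(v, p):
    """Return ||v||_p^p = sum_i |v_i|^p."""
    return sum(abs(t) ** p for t in v)

def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def map_criterion(x, y, A, M, psi, lam, p, q):
    """psi*||y - A x||_p^p + lam*||M x||_q^q, i.e. -log p_b - log p_x up to constants."""
    residual = [yi - ai for yi, ai in zip(y, matvec(A, x))]
    return psi * lp_pow(residual, p) + lam * lp_pow(matvec(M, x), q)

A = [[1.0, 0.3, 0.0], [0.2, 1.0, 0.3], [0.0, 0.3, 1.0]]
M = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]       # first-order difference matrix
y = [0.8, 1.9, 1.1]
psi, lam, p, q, k = 2.0, 0.7, 1.2, 1.5, 5.0
psi_k, lam_k = psi * k ** -p, lam * k ** -q    # hyperparameter update (psi_k, lambda_k)

rng = random.Random(0)
criterion_matches = []
for _ in range(5):
    x = [rng.uniform(-2.0, 2.0) for _ in range(3)]
    j = map_criterion(x, y, A, M, psi, lam, p, q)
    j_k = map_criterion([k * t for t in x], [k * t for t in y], A, M, psi_k, lam_k, p, q)
    criterion_matches.append(abs(j_k - j) <= 1e-9 * abs(j))

gen_snr, gen_snr_k = psi ** q / lam ** p, psi_k ** q / lam_k ** p
print(all(criterion_matches), gen_snr, gen_snr_k)
```

Checking the criterion pointwise, rather than running an optimizer, avoids mixing the invariance question with convergence issues of a particular minimization algorithm.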

Maximum Entropy methods: Maximum Entropy (ME) reconstruction methods have been used extensively in the last decade. A helpful property of these methods is the positivity of the restored image. In these methods, the noise is considered zero-mean Gaussian $\mathcal{N}(0, \Sigma_b)$, while the log-prior takes different forms resembling an "entropy measure" of Burg or Shannon. Three forms that have been used in practical problems are considered below.

• First, in a Fourier synthesis problem, Wernecke & D'Addario [12] used the following form:

$$p_x(x;\lambda) \propto \exp\left[-\lambda\sum_i \log x_i\right]. \qquad (16)$$

Changing the scale in this context only modifies the partition function, since $\sum_i \log(kx_i) = \sum_i \log x_i + N\log k$, and the partition function plays no role in the MAP criterion (e.g. […]). As the noise is considered Gaussian, these authors show that if the parameter $\lambda$ is updated appropriately ($\lambda_k = k^2\lambda$), the ME reconstruction scales linearly with the measurement scale $k$. Thus, this ME solution is scale invariant, although nonlinear.



• In image restoration, Burch et al. […] consider a prior law of the form

$$p_x(x;\lambda) \propto \exp\left[-\lambda\sum_i x_i\log x_i\right]. \qquad (17)$$

Applying our change of scale yields

$$p_x(kx;\lambda) \propto \exp\left[-\lambda\left(k\sum_i x_i\log x_i + k\log k\sum_i x_i\right)\right], \qquad (18)$$

which does not satisfy the scale invariance property because of the $k\log k\sum_i x_i$ term. It appears from their later papers that they introduced a data pre-scaling before the reconstruction. The modified version of their entropy then becomes

$$p_x(x;\lambda) \propto \exp\left[-\lambda\sum_i \frac{x_i}{s}\log\frac{x_i}{s}\right], \qquad (19)$$

where $s$ is the pre-scaling parameter.



• Modification of the above expression with the natural parameters of the exponential family leads to the "entropic laws" used later by Gull & Skilling [14] and Djafari […]:

$$p_x(x;\lambda) \propto \exp\left[-\lambda_1\sum_i \log x_i - \lambda_2\sum_i x_i\right]. \qquad (20)$$

The resulting estimator is scale invariant for the reasons stated above.
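The three ME priors can be contrasted numerically. In the sketch below (our own illustration; the function names are hypothetical), each function returns $-\log p_x$ up to an additive constant, for (16), (17), and (20) read as $-\lambda_1\sum_i\log x_i - \lambda_2\sum_i x_i$. A rescaling $x \mapsto kx$, with the indicated parameter update, is acceptable when it only adds an $x$-independent constant, i.e. only changes the partition function. For (17), even the natural update $\lambda_k = \lambda/k$ leaves the $x$-dependent term $\lambda\log k\sum_i x_i$, in line with (18). The check works at the density level (noise scale rescaled with the data), which is why the Burg case keeps $\lambda$ fixed rather than using the update $\lambda_k = k^2\lambda$ stated for a fixed noise variance.

```python
# Sketch: does x -> k x only change the partition function of the prior?
import math
import random

def burg(x, lam):                # -log p_x for Eq. (16), up to a constant
    return lam * sum(math.log(t) for t in x)

def shannon(x, lam):             # -log p_x for Eq. (17), up to a constant
    return lam * sum(t * math.log(t) for t in x)

def entropic(x, lam1, lam2):     # -log p_x for Eq. (20), up to a constant
    return lam1 * sum(math.log(t) for t in x) + lam2 * sum(x)

def offset_is_constant(energy, theta, theta_k, k, trials=20, seed=1):
    """True if energy(k x; theta_k) - energy(x; theta) is the same for all sampled x > 0."""
    rng = random.Random(seed)
    offsets = []
    for _ in range(trials):
        x = [rng.uniform(0.1, 5.0) for _ in range(4)]
        offsets.append(energy([k * t for t in x], *theta_k) - energy(x, *theta))
    return max(offsets) - min(offsets) < 1e-9

k, lam, lam1, lam2 = 3.0, 0.8, 0.5, 0.2
burg_ok = offset_is_constant(burg, (lam,), (lam,), k)              # offset = N lam log k
shannon_ok = offset_is_constant(shannon, (lam,), (lam / k,), k)    # leaves lam log(k) sum(x)
entropic_ok = offset_is_constant(entropic, (lam1, lam2), (lam1, lam2 / k), k)
print(burg_ok, shannon_ok, entropic_ok)
```

Sampling random positive images is only a numerical probe: a constant offset on the samples is consistent with, but does not prove, the algebraic identity derived above.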



Markovian models: A new Markovian model […] has emerged from I-divergence considerations on small translations of an image in the context of astronomical deconvolution. This model can be rewritten as a Gibbs distribution in the following form:

$$p_x(x;\lambda) \propto \exp\left[-\lambda\sum_{(s,r)\in C}(x_s - x_r)\log\frac{x_s}{x_r}\right]. \qquad (21)$$

If we change the measurement scale, the scale factor $k$ vanishes in the logarithm, and

$$p_x(kx;\lambda) \propto \exp\left[-k\lambda\sum_{(s,r)\in C}(x_s - x_r)\log\frac{x_s}{x_r}\right]. \qquad (22)$$

Thus this particular Markov random field leads to a scale invariant estimator if the parameter $\lambda$ is updated so that $\lambda k$ remains constant (the noise being assumed Gaussian and independent). As in the Lp-norm example, $\lambda\sigma_b$ can be considered as a generalized SNR.

These examples show that the family of scale invariant laws is not an exotic curiosity: it includes many priors already employed in image estimation. We have shown in related work that other scale invariant prior laws exist, both in the Markovian prior family [17] and in the uncorrelated prior family.
Scheme 1: Global scale invariance property for an estimator



3. Scale invariant Bayesian estimator 

Before developing the scale invariance constraint on the estimator further, we want to emphasize the role of the hyperparameters $\theta$ (i.e., the parameters of the prior laws) and to sketch their estimation from the data, which is very important in real-world applications. The estimation problem is considered globally: although we are interested in the estimation of $x$, we also want to take the estimation of the hyperparameters $\theta$ into account. The SIP of an estimator is summarized in Scheme 1.

More precisely, let us define a scale invariant estimator in the following way:

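Definition 1 An estimator $\hat{x}(y;\theta)$ is said to be scale invariant if there exists a function $\theta_k = f_k(\theta)$ such that

$$\forall (y, \theta, k > 0), \quad \hat{x}(ky;\theta_k) = k\,\hat{x}(y;\theta), \qquad (23)$$

or, in short,

$$y \mapsto \hat{x} \implies \forall k > 0, \; ky \mapsto k\hat{x}. \qquad (24)$$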

In this paper, we focus only on priors that admit densities. We then define the scale invariance property for these laws as follows:

Definition 2 A probability density function $p_u(u;\theta)$ [resp. a conditional density $p_{u|v}(u|v;\theta)$] is said to be scale invariant if there exists a function $\theta_k = f_k(\theta)$ such that

$$\forall (u, \theta, k > 0), \quad p_u(ku;\theta_k) = k^{-N}p_u(u;\theta), \qquad (25)$$

[resp. $\forall (u, v, \theta, k > 0)$, $p_{u|v}(ku|kv;\theta_k) = k^{-N}p_{u|v}(u|v;\theta)$], where $N = \dim(u)$.

If $f_k = \mathrm{Id}$, i.e., if $\theta_k = \theta$, then $p_u(u;\theta)$ is said to be strictly scale invariant.
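Two familiar one-dimensional families satisfy Definition 2 with $N = 1$: the centred Gaussian with $f_k(\sigma) = k\sigma$, and the exponential law $\beta e^{-\beta u}$ on $u > 0$ with $f_k(\beta) = \beta/k$. The sketch below is our own example, not from the paper; it checks (25) at a few points for both, and shows that the Gaussian family fails the check if $\sigma$ is not updated.

```python
# Sketch: numerical check of Definition 2, Eq. (25), for two 1-D families (N = 1).
import math

def gauss_pdf(u, sigma):
    """Centred Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def expo_pdf(u, beta):
    """Exponential density beta * exp(-beta u), for u > 0."""
    return beta * math.exp(-beta * u)

def check_sip(pdf, theta, f_k, k, points, rel_tol=1e-10):
    """Check pdf(k u; f_k(theta)) == pdf(u; theta) / k at every sample point (N = 1)."""
    return all(math.isclose(pdf(k * u, f_k(theta, k)), pdf(u, theta) / k, rel_tol=rel_tol)
               for u in points)

points = [0.1, 0.5, 1.0, 2.5, 4.0]
k = 2.5
gauss_ok = check_sip(gauss_pdf, 1.3, lambda s, k: k * s, k, points)   # f_k(sigma) = k sigma
expo_ok = check_sip(expo_pdf, 0.7, lambda b, k: b / k, k, points)     # f_k(beta) = beta / k
gauss_fixed = check_sip(gauss_pdf, 1.3, lambda s, k: s, k, points)    # no update: fails
print(gauss_ok, expo_ok, gauss_fixed)
```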

The above property states that these density laws belong to a family of laws that is closed under scale transformations. Thus, within this class, a set of pertinent parameters exists for each chosen scale.

We also need two properties of scale invariant density laws. Both concern the conservation of the SIP: one under marginalization, the other under application of Bayes' rule.
Lemma 1 If $p_{x,y}(x, y;\theta)$ is scale invariant, then the marginal law $p_y(y;\theta)$ is also scale invariant.

Lemma 2 If $p_x(x;\lambda)$ and $p_{y|x}(y|x;\psi)$ are scale invariant, then the joint law $p_{x,y}(x, y;\lambda,\psi)$ is also scale invariant.

The proofs are straightforward and are given in Appendix A.

Using these definitions and lemmas, we prove the following theorem, which summarizes sufficient conditions for an estimator to be scale invariant:

Theorem 1 If the cost function $C(x^*, x)$ of a Bayesian estimator satisfies the condition

$$\forall k > 0, \; \exists (a_k \in \mathbb{R}, b_k > 0) \text{ such that } \forall (x^*, x), \quad C(kx^*, kx) = a_k + b_k\,C(x^*, x), \qquad (26)$$

and if the posterior law is scale invariant, i.e., there exists a function $\theta_k = f_k(\theta)$ such that

$$\forall k > 0, \; \forall (x, y), \quad p(kx|ky;\theta_k) = k^{-\dim(x)}p(x|y;\theta), \qquad (27)$$

then the resulting Bayesian estimator is scale invariant, i.e.,

$$\hat{x}(ky;\theta_k) = k\,\hat{x}(y;\theta). \qquad (28)$$

See Appendix B for the proof. It is also shown there that the cost functions of the three classical Bayesian estimators, i.e., MAP, PM and MMAP, satisfy the first condition (26).
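The mechanism of Theorem 1 can be illustrated on a one-dimensional, discretized posterior. In the sketch below (a toy setting of our own: a Gamma(3, 1)-shaped posterior on a uniform grid), the posterior at scale $k$ is built as $p(u \mid ky;\theta_k) = k^{-1}p(u/k \mid y;\theta)$, as required by (27), and each estimate is obtained by brute-force minimization of the posterior expected cost. The quadratic cost (PM) satisfies (26) with $b_k = k^2$; the absolute cost $|x^* - x|$ (posterior median), which is not one of the three estimators discussed here, satisfies it with $b_k = k$. In both cases the estimate is multiplied by $k$.

```python
# Sketch: Theorem 1 on a discretized 1-D posterior (toy example, not from the paper).
import math

def bayes_estimate(grid, density, cost):
    """Grid point c minimizing the discretized posterior expected cost
    sum_j cost(x_j, c) * density(x_j) * dx over a uniform grid."""
    dx = grid[1] - grid[0]
    weights = [density(xj) * dx for xj in grid]
    return min(grid, key=lambda c: sum(cost(xj, c) * w for xj, w in zip(grid, weights)))

def posterior(x):
    """Toy posterior p(x | y; theta): a Gamma(3, 1) density."""
    return x * x * math.exp(-x) / 2.0

k = 3.0
grid = [0.04 * j for j in range(1, 301)]
grid_k = [k * x for x in grid]              # the same grid expressed at scale k

def posterior_k(u):
    """Posterior at scale k, built to satisfy Eq. (27): p(u | ky; theta_k) = p(u / k | y; theta) / k."""
    return posterior(u / k) / k

def quadratic(xs, x):                       # PM cost: C(kx*, kx) = k^2 C(x*, x)
    return (xs - x) ** 2

def absolute(xs, x):                        # median cost: C(kx*, kx) = k C(x*, x)
    return abs(xs - x)

pm, pm_k = bayes_estimate(grid, posterior, quadratic), bayes_estimate(grid_k, posterior_k, quadratic)
med, med_k = bayes_estimate(grid, posterior, absolute), bayes_estimate(grid_k, posterior_k, absolute)
print(pm, pm_k, med, med_k)
```

Restricting the candidates to the grid keeps the sketch simple; the scaling argument is the same as in the proof, since the expected cost at scale $k$ is an affine function, with positive slope, of the expected cost at scale 1.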

Remark: In this theorem, the SIP is applied to the posterior law $p(x|y;\theta)$. However, we can separate the hyperparameters into two sets $\lambda$ and $\psi$, the parameters of the prior laws $p_x(x;\lambda)$ and $p_b(y - Ax;\psi)$ respectively. In what follows, we want to make the choices of $p_x$ and $p_b$ independent. From Lemmas 1 and 2, if $p_x$ and $p_b$ satisfy the SIP, then so do the joint law and the marginal law of $y$, hence the posterior $p(x|y;\theta) = p_{x,y}(x, y;\theta)/p_y(y;\theta)$ satisfies the SIP. As a consequence, $\theta_k$ must be separated according to $\theta_k = (\lambda_k, \psi_k) = (g_k(\lambda), h_k(\psi))$.



4. Hyperparameter estimation

In the above theorem, we assumed that the hyperparameters $\theta$ are given. Thus, given the data $y$ and the hyperparameters $\theta$, we can compute $\hat{x}$. If the scale factor $k$ of the data has been changed, we first have to update the hyperparameters according to $\theta_k = f_k(\theta)$, and then we can use the SIP:

$$\hat{x}(ky;\theta_k) = k\,\hat{x}(y;\theta). \qquad (29)$$

Now, let us see what happens if we have to estimate both $x$ and $\theta$, by either Maximum Likelihood or Generalized Maximum Likelihood.

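• The Maximum Likelihood (ML) method first estimates $\theta$ by

$$\hat{\theta} = \arg\max_{\theta}\{L(\theta)\}, \qquad (30)$$

where

$$L(\theta) = p(y;\theta), \qquad (31)$$

and then $\hat{\theta}$ is used to estimate $x$. At a scale $k$,

$$\hat{\theta}_k = \arg\max_{\theta_k}\{L_k(\theta_k)\}. \qquad (32)$$

Application of Lemma 1 implies that, for $\theta_k = f_k(\theta)$,

$$L_k(\theta_k) = k^{-\dim(y)}L(\theta); \qquad (33)$$

since the factor $k^{-\dim(y)}$ does not depend on $\theta$, the Maximum Likelihood estimator satisfies the condition

$$\hat{\theta}_k = f_k(\hat{\theta}). \qquad (34)$$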
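A minimal sketch of (31) to (34), assuming a toy model of our own in which the data are i.i.d. centred Gaussian, $y_i \sim \mathcal{N}(0, \sigma^2)$, so that $\theta = \sigma$ and $f_k(\sigma) = k\sigma$. It checks (33) in logarithmic form for a few values of $\sigma$, and checks that the closed-form ML estimate follows the same map $f_k$.

```python
# Sketch: Eqs. (31)-(34) for a toy model y_i ~ N(0, sigma^2) i.i.d. (our assumption).
import math

def log_likelihood(y, sigma):
    """log L(sigma) = log p(y; sigma) for i.i.d. centred Gaussian data."""
    return sum(-0.5 * (yi / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi)) for yi in y)

def ml_sigma(y):
    """Closed-form ML estimate of sigma for this model."""
    return math.sqrt(sum(yi * yi for yi in y) / len(y))

y = [0.4, -1.1, 2.3, 0.7, -0.2, 1.5]
k = 2.5
y_k = [k * yi for yi in y]

# Eq. (33) in log form: log L_k(f_k(sigma)) = log L(sigma) - dim(y) log k, with f_k(sigma) = k sigma.
shift_ok = all(abs(log_likelihood(y_k, k * s) - (log_likelihood(y, s) - len(y) * math.log(k))) < 1e-9
               for s in (0.5, 1.0, 2.0))
# Eq. (34): the ML estimate follows the same map f_k.
ml_ok = abs(ml_sigma(y_k) - k * ml_sigma(y)) < 1e-9
print(shift_ok, ml_ok)
```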

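The likelihood function (e.g. […]) rarely has an explicit form, and a common algorithm for its local maximization is the EM algorithm, an iterative algorithm briefly described as follows:

$$Q(\theta;\theta^{(n)}) = E_{x|y;\theta^{(n)}}\{\ln p(y|x;\theta)\}, \qquad \theta^{(n+1)} = \arg\max_{\theta}Q(\theta;\theta^{(n)}). \qquad (35)$$

At a scale $k$, with $\theta_k = f_k(\theta)$ and $\theta_k^{(n)} = f_k(\theta^{(n)})$,

$$Q_k(\theta_k;\theta_k^{(n)}) = E_{kx|ky;\theta_k^{(n)}}\{\ln p(ky|kx;\theta_k)\} = -M\ln k + E_{x|y;\theta^{(n)}}\{\ln p(y|x;\theta)\} = -M\ln k + Q(\theta;\theta^{(n)}), \qquad (36)$$

with $M = \dim(y)$. Thus, if we initialize this iterative algorithm with $\theta_k^{(0)} = f_k(\theta^{(0)})$, then

$$\theta_k^{(n)} = f_k(\theta^{(n)}) \quad \text{for all } n. \qquad (37)$$

Hence the scale coherence of the hyperparameters is preserved throughout the optimization steps.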
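As an illustration of (35) to (37), consider a toy denoising model of our own choosing: $y_i = x_i + b_i$ with $x_i \sim \mathcal{N}(0, \sigma_x^2)$, $\sigma_x^2$ known, and $b_i \sim \mathcal{N}(0, \sigma_b^2)$, the unknown hyperparameter being $\theta = \sigma_b^2$. The Q function of (35) is then maximized in closed form by $\sigma_b^2 = \frac{1}{M}\sum_i E[(y_i - x_i)^2 \mid y_i]$. The sketch runs EM at two scales, with the known $\sigma_x^2$ and the initial value both multiplied by $k^2$, and checks that every iterate satisfies $\theta_k^{(n)} = k^2\theta^{(n)}$.

```python
# Sketch: EM iterates for the noise variance in a toy model (our assumption):
# y_i = x_i + b_i, x_i ~ N(0, sx2) with sx2 known, b_i ~ N(0, sb2) with sb2 unknown.

def em_noise_variance(y, sx2, sb2_init, n_iter=15):
    sb2 = sb2_init
    history = [sb2]
    for _ in range(n_iter):
        gain = sx2 / (sx2 + sb2)              # posterior mean of x_i is gain * y_i
        post_var = sx2 * sb2 / (sx2 + sb2)    # posterior variance of x_i
        # E-step: E[(y_i - x_i)^2 | y_i] = ((1 - gain) y_i)^2 + post_var.
        # M-step: the maximiser of Q in Eq. (35) is the average of these terms.
        sb2 = sum(((1.0 - gain) * yi) ** 2 + post_var for yi in y) / len(y)
        history.append(sb2)
    return history

y = [0.3, -1.2, 2.1, 0.8, -0.5, 1.7, -2.4, 0.1]
sx2, sb2_0, k = 1.0, 0.5, 4.0

hist = em_noise_variance(y, sx2, sb2_0)
# At scale k: data times k, known prior variance and initial value times k^2 (f_k).
hist_k = em_noise_variance([k * v for v in y], k * k * sx2, k * k * sb2_0)
coherent = all(abs(a - k * k * b) <= 1e-9 * k * k * b for a, b in zip(hist_k, hist))
print(hist[-1], hist_k[-1], coherent)
```

The check holds at every iteration, not only at convergence, which is the point of (37).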
• In the Generalized Maximum Likelihood (GML) method, one estimates both $\theta$ and $x$ by

$$(\hat{\theta}, \hat{x}) = \arg\max_{(\theta, x)}\{p(x, y;\theta)\}. \qquad (38)$$

Applying the same argument as above to the joint law rather than to the marginal one leads to

$$(\hat{\theta}_k, \hat{x}_k) = (f_k(\hat{\theta}), k\hat{x}). \qquad (39)$$

However, this holds only if the GML criterion has a maximum. This is not always the case, and it is a major drawback of GML. Also, in the GML method, direct resolution is rarely possible, and sub-optimal techniques lead to the classical two-step estimation scheme:

$$\hat{x}^{(n)} = \arg\max_{x}\{p(x, y;\theta^{(n-1)})\}, \qquad (40)$$

$$\theta^{(n)} = \arg\max_{\theta}\{p(\hat{x}^{(n)}, y;\theta)\}. \qquad (41)$$

We see that, at each iteration, the $\theta$ estimation step may be considered as the ML estimation of $\theta$ if $\hat{x}^{(n)}$ is considered as a realization of the prior law. Thus the coherence of the estimated hyperparameters at different scales is preserved during both optimization steps, and

$$(\theta_k^{(n)}, \hat{x}_k^{(n)}) = (f_k(\theta^{(n)}), k\hat{x}^{(n)}). \qquad (42)$$



Thus, if we consider the whole estimation problem (with an ML or GML approach), the SIP of the estimator is ensured in both cases, including during the iterative optimization schemes of ML and GML.
5. Markovian invariant distributions

Markovian distributions used as priors in image processing allow one to introduce local characteristics and inter-pixel correlations. They are widely used, but many different Markovian models exist and very few model selection guidelines are available. In this section we apply the above scale invariance considerations to prior model selection in the case of first-order homogeneous Markov random fields (MRFs).

Let $x$ be a homogeneous Markov random field defined on the subset $[1 \ldots N] \times [1 \ldots M]$ of $\mathbb{Z}^2$. The Markov characteristic property is

$$p_x(x_i \mid x_{S \setminus i}) = p_x(x_i \mid x_{\partial i}), \qquad (43)$$

where $\partial i$ is the neighbourhood of site $i$ and $S$ is the set of pixels. The Hammersley-Clifford theorem for the first-order neighbourhood reads

$$p_x(x;\lambda) \propto \exp\left[-\lambda\sum_{(s,r)\in C}\phi(x_s, x_r)\right], \qquad (44)$$

where $C$ is the set of cliques and $\phi(x, y)$ the clique potential. In most works […], a simplified model is introduced in the form $\phi(x, y) = \phi(x - y)$. Here we keep a general point of view. Applying the scale invariance condition to the Markovian prior laws $p_x(x;\lambda)$ leads to the following two theorems:

Theorem 2 A family of Markovian distributions is scale invariant if and only if there exist two functions $f(k, \lambda)$ and $\beta(k)$ such that the clique potential $\phi(x_s, x_r)$ satisfies

$$f(k, \lambda)\,\phi(kx_s, kx_r) = \lambda\,\phi(x_s, x_r) + \beta(k). \qquad (45)$$

Theorem 3 A necessary and sufficient condition for a Markov random field to be scale invariant is that there exists a triplet $(a, b, c)$ such that the clique potential $\phi(x_s, x_r)$ satisfies the linear partial differential equation (PDE)

$$a\,\phi(x_s, x_r) + b\left(x_s\frac{\partial\phi(x_s, x_r)}{\partial x_s} + x_r\frac{\partial\phi(x_s, x_r)}{\partial x_r}\right) = c.$$
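The PDE condition of Theorem 3, read here as $a\phi + b\,(x_s\partial\phi/\partial x_s + x_r\partial\phi/\partial x_r) = c$, can be probed numerically. The Euler term $x_s\partial\phi/\partial x_s + x_r\partial\phi/\partial x_r$ equals $\frac{d}{dk}\phi(kx_s, kx_r)$ at $k = 1$, which the sketch below approximates by a central difference. The triplets shown are our own derivations for the example potentials; the Cauchy-type potential $\log(1 + (x_s - x_r)^2)$ is a hypothetical counterexample. If some nonzero $(a, b, c)$ solved the PDE, the vectors $(\phi, \text{Euler term}, 1)$ at any three points would be linearly dependent, so a clearly nonzero determinant excludes every triplet at once.

```python
# Sketch: testing the PDE a*phi + b*(xs*phi_xs + xr*phi_xr) = c of Theorem 3 numerically.
import math

def euler_term(phi, xs, xr, h=1e-5):
    """xs*dphi/dxs + xr*dphi/dxr, computed as d/dk phi(k xs, k xr) at k = 1."""
    return (phi((1 + h) * xs, (1 + h) * xr) - phi((1 - h) * xs, (1 - h) * xr)) / (2 * h)

POINTS = [(1.0, 2.0), (1.0, 4.0), (3.0, 1.5), (0.5, 0.8), (2.2, 5.0)]

def satisfies_pde(phi, a, b, c, tol=1e-6):
    return all(abs(a * phi(xs, xr) + b * euler_term(phi, xs, xr) - c) < tol
               for xs, xr in POINTS)

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def no_triplet_exists(phi):
    """A clearly nonzero determinant of the rows (phi, euler, 1) at three points
    shows that no nonzero (a, b, c) can satisfy the PDE."""
    rows = [[phi(xs, xr), euler_term(phi, xs, xr), 1.0] for xs, xr in POINTS[:3]]
    return abs(det3(rows)) > 1e-3

p, rho = 1.3, 0.7
ggmrf = lambda xs, xr: abs(xs - xr) ** p                                  # (a, b, c) = (-p, 1, 0)
osullivan = lambda xs, xr: (xs - xr) * math.log(xs / xr)                  # (a, b, c) = (-1, 1, 0)
log_ratio = lambda xs, xr: math.log(xs / xr) ** 2 + rho * math.log(xs * xr)  # (0, 1, 2*rho)
cauchy = lambda xs, xr: math.log(1.0 + (xs - xr) ** 2)                    # not scale invariant

print(satisfies_pde(ggmrf, -p, 1, 0), satisfies_pde(osullivan, -1, 1, 0),
      satisfies_pde(log_ratio, 0, 1, 2 * rho), no_triplet_exists(cauchy))
```

Finite differences along the ray $k \mapsto (kx_s, kx_r)$ are used because scaling preserves the sign of $x_s - x_r$, so $|x_s - x_r|^p$ stays smooth along that ray even though it is not differentiable on the diagonal.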

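Finally, enforcing the symmetry of the clique potentials, $\phi(x_s, x_r) = \phi(x_r, x_s)$, the following theorem provides the set of scale invariant clique potentials:

Theorem 4 $p_x(x;\lambda)$ is scale invariant if and only if its clique potential is chosen from one of the following vector spaces:

$$V_0 = \left\{\phi(x_s, x_r) = \psi\!\left(\log\frac{x_s}{x_r}\right) + \rho\log|x_s x_r| \;\middle|\; \psi(\cdot) \text{ even and } \rho \in \mathbb{R}\right\}, \qquad (46)$$

$$V_1(p) = \left\{\phi(x_s, x_r) = |x_s x_r|^{p/2}\,\psi\!\left(\log\frac{x_s}{x_r}\right) \;\middle|\; \psi(\cdot) \text{ even}\right\}. \qquad (47)$$

Moreover, $V_0$ is the subspace of strictly scale invariant clique potentials.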



For the proofs of these theorems, see [22].



Among the most common models used in image processing, only a few clique potentials fall into the above set. Let us give two examples.

First, the GGMRFs proposed by Bouman & Sauer […] were built with a similar scale invariance approach, but under the restrictive assumption that $\phi(x_s, x_r) = \phi(x_s - x_r)$. The resulting expression $\phi(x_s, x_r) = |x_s - x_r|^p$ can be factored, for positive $x_s$ and $x_r$, as $\phi(x_s, x_r) = |x_s x_r|^{p/2}\,|2\sinh(\log(x_s/x_r)/2)|^p$, which shows that it falls in $V_1(p)$.

The second example is a potential that does not reduce to a function of the single variable $x_s - x_r$: $\phi(x_s, x_r) = (x_s - x_r)\log(x_s/x_r)$. It has recently been introduced from I-divergence penalty considerations in image estimation problems (optical deconvolution) by O'Sullivan […]. Factoring $|x_s x_r|^{1/2}$ leads to

$$\phi(x_s, x_r) = |x_s x_r|^{1/2}\,\psi\!\left(\log(x_s/x_r)\right), \qquad (48)$$

where $\psi(X) = 2X\sinh(X/2)$ is even. This shows that $\phi(x_s, x_r)$ is in $V_1(1)$ and is scale invariant. As $\phi(x_s, x_r)$ is defined only on $(\mathbb{R}_+^*)^2$, it applies to positive quantities. This feature is very useful in image processing, where prior positivity applies to many physical quantities.
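Both factorizations rest on the identity $x_s - x_r = \sqrt{x_s x_r}\,2\sinh(\tfrac{1}{2}\log(x_s/x_r))$ for positive $x_s, x_r$. The sketch below is our own check, at a few hypothetical positive pixel values. It verifies the factorizations of the form $|x_s x_r|^{p/2}\psi(\log(x_s/x_r))$ with $\psi$ even, the evenness of the factors, and the resulting homogeneity $\phi(kx_s, kx_r) = k^p\phi(x_s, x_r)$, with degree $p$ for the GGMRF and degree 1 for (48).

```python
# Sketch: checking the factorizations used for the GGMRF and I-divergence potentials.
import math

def psi_ggmrf(X, p):
    """|2 sinh(X/2)|^p, the even factor of the GGMRF potential."""
    return abs(2.0 * math.sinh(X / 2.0)) ** p

def psi_osullivan(X):
    """2 X sinh(X/2), the even factor in Eq. (48)."""
    return 2.0 * X * math.sinh(X / 2.0)

p, k = 1.3, 2.7
points = [(1.0, 2.0), (0.3, 4.5), (2.5, 2.4), (5.0, 0.2)]   # positive pixel values
checks = []
for xs, xr in points:
    X = math.log(xs / xr)
    ggmrf = abs(xs - xr) ** p
    osullivan = (xs - xr) * math.log(xs / xr)
    # Factorizations phi = |xs xr|^(degree/2) * psi(log(xs/xr)).
    checks.append(math.isclose(ggmrf, abs(xs * xr) ** (p / 2) * psi_ggmrf(X, p)))
    checks.append(math.isclose(osullivan, math.sqrt(xs * xr) * psi_osullivan(X)))
    # The factors psi are even functions of X.
    checks.append(math.isclose(psi_ggmrf(-X, p), psi_ggmrf(X, p)))
    checks.append(math.isclose(psi_osullivan(-X), psi_osullivan(X)))
    # Homogeneity: degree p for the GGMRF, degree 1 for the I-divergence potential.
    checks.append(math.isclose(abs(k * xs - k * xr) ** p, k ** p * ggmrf))
    checks.append(math.isclose((k * xs - k * xr) * math.log((k * xs) / (k * xr)), k * osullivan))
print(all(checks))
```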

6. Conclusions 

In this paper we have outlined and justified a property, weaker than linearity, that Bayesian estimators should have. We have shown that this scale invariance property (SIP) helps to avoid an arbitrary choice of the measurement scale. Some models already employed in Bayesian estimation, including Markov prior models […], entropic priors […] and generalized Gaussian models […], demonstrate the existence and usefulness of scale invariant models. We have then given general conditions for a Bayesian estimator to be scale invariant. This property holds for most Bayesian estimators, such as MAP, PM and MMAP, provided that the prior laws are also scale invariant. Thus, imposing the SIP can assist in model selection. We have also shown that classical hyperparameter estimation methods preserve the SIP for the estimated laws.

Finally, we discussed how to choose the prior laws to obtain scale invariant Bayesian estimators. For this, we considered two cases: entropic prior laws and first-order Markov models. In related preceding works […], the SIP constraints have been studied for the case of entropic prior laws. In this paper we extended that work to the case of first-order Markov models and showed that many common Markov models used in image processing are special cases.

Appendix A. SIP property inheritance

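• Proof of Lemma 1:

Let $p_{x,y}(x, y;\theta)$ have the scale invariance property; then there exists $\theta_k = f_k(\theta)$ such that

$$p_{x,y}(kx, ky;\theta_k) = k^{-(N+M)}p_{x,y}(x, y;\theta),$$

where $N = \dim(x)$ and $M = \dim(y)$. Marginalizing with respect to the first variable and substituting $u = kx$ (so that $du = k^N dx$), we obtain

$$p_y(ky;\theta_k) = \int p_{x,y}(kx, ky;\theta_k)\,k^N dx = k^{-(N+M)}\int p_{x,y}(x, y;\theta)\,k^N dx = k^{-M}p_y(y;\theta),$$

which completes the proof.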

• Proof of Lemma 2:

The definition of the SIP for density laws and direct application of Bayes' rule lead to

$$p_{x,y}(kx, ky;\theta_k) = p_x(kx;\lambda_k)\,p_{y|x}(ky|kx;\psi_k) = k^{-N}p_x(x;\lambda)\,k^{-M}p_{y|x}(y|x;\psi) = k^{-(N+M)}p_{x,y}(x, y;\theta),$$

which concludes the proof.



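Appendix B. SIP conditions for Bayesian estimators

• Proof of Theorem 1:

Since a Bayesian estimator is defined by

$$\hat{x} = \arg\min_{x}\left\{\int C(x^*, x)\,p(x^*|y;\theta)\,dx^*\right\},$$

we have, substituting $x_k^* = kx^*$ and $x_k = kx$ and using (26) and (27),

$$\begin{aligned}
\hat{x}_k &= \arg\min_{x_k}\left\{\int C(x_k^*, x_k)\,p(x_k^*|ky;\theta_k)\,dx_k^*\right\}\\
&= k\,\arg\min_{x}\left\{\int C(kx^*, kx)\,p(kx^*|ky;\theta_k)\,k^N dx^*\right\}\\
&= k\,\arg\min_{x}\left\{\int \left[a_k + b_k\,C(x^*, x)\right]k^{-N}p(x^*|y;\theta)\,k^N dx^*\right\}\\
&= k\,\arg\min_{x}\left\{a_k + b_k\int C(x^*, x)\,p(x^*|y;\theta)\,dx^*\right\} = k\,\hat{x},
\end{aligned}$$

since $a_k$ does not depend on $x$ and $b_k > 0$. This proves Theorem 1.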

• Conditions for cost functions:

The three classical Bayesian estimators, MAP, PM and MMAP, satisfy the condition (26) on the cost function:

- Posterior mean (PM): $C(kx^*, kx) = (kx^* - kx)^t Q\,(kx^* - kx) = k^2\,C(x^*, x)$.

- Maximum a posteriori (MAP): $C(kx^*, kx) = 1 - \delta(kx^* - kx) = C(x^*, x)$.

- Marginal Maximum a Posteriori (MMAP):